19 research outputs found

    SPGP: Structure Prototype Guided Graph Pooling

    While graph neural networks (GNNs) have been successful for node classification and link prediction tasks on graphs, learning graph-level representations remains a challenge. For graph-level representation, it is important to learn both the representation of neighboring nodes, i.e., aggregation, and graph structural information. A number of graph pooling methods have been developed for this goal. However, most existing pooling methods utilize the k-hop neighborhood without considering explicit structural information in a graph. In this paper, we propose Structure Prototype Guided Pooling (SPGP), which utilizes prior graph structures to overcome this limitation. SPGP formulates graph structures as learnable prototype vectors and computes the affinity between nodes and prototype vectors. This leads to a novel node scoring scheme that prioritizes informative nodes while encapsulating the useful structures of the graph. Our experimental results show that SPGP outperforms state-of-the-art graph pooling methods on graph classification benchmark datasets in both accuracy and scalability. Comment: 18 pages, 6 figures
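The scoring idea in the SPGP abstract (affinity between nodes and prototype vectors, then keeping the highest-scoring nodes) can be illustrated with a minimal sketch. The abstract does not specify the affinity function, how affinities are combined into a score, or how prototypes are learned, so cosine similarity, the max over prototypes, top-k selection, and the function names below are all assumptions made for illustration, not the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed affinity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score_nodes(node_embeddings, prototypes):
    """Score each node by its best affinity to any prototype (assumed rule)."""
    return [max(cosine(x, p) for p in prototypes) for x in node_embeddings]

def top_k_pool(node_embeddings, prototypes, k):
    """Return the indices of the k highest-scoring nodes, in ascending order."""
    scores = score_nodes(node_embeddings, prototypes)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Toy input: four 2-d node embeddings and one fixed (not learned) prototype.
nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
prototypes = [[1.0, 0.0]]
kept = top_k_pool(nodes, prototypes, k=2)
```

In a trained model the prototypes would be parameters updated by gradient descent; here they are fixed constants so that the scoring step can be read in isolation.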

    Venn-diaNet : venn diagram based network propagation analysis framework for comparing multiple biological experiments

    Background: The main research topic in this paper is how to compare multiple biological experiments using transcriptome data, where each experiment is designed and measured to compare control and treated samples. Comparison of multiple biological experiments is usually performed in terms of the number of differentially expressed genes (DEGs) in an arbitrary combination of biological experiments. This process is usually facilitated with a Venn diagram, but there are several issues when a Venn diagram is used to compare and analyze multiple experiments in terms of DEGs. First, current Venn diagram tools do not provide systematic analysis to prioritize genes. Because current tools generally do not focus on prioritizing genes, genes located in the segments of the Venn diagram (especially the intersections) are usually difficult to rank. Second, elucidating the phenotypic difference only with the lists of DEGs and expression values is challenging when the experimental designs involve combinations of treatments. The synergistic effect of a combination of treatments is very difficult to find without an informative system. Results: We introduce Venn-diaNet, a Venn diagram based analysis framework that uses network propagation on a protein-protein interaction network to prioritize genes from experiments that have multiple DEG lists. We suggest that the two issues can be effectively handled by ranking or prioritizing genes within segments of a Venn diagram. The user can easily compare multiple DEG lists with gene rankings, which are easy to understand and can also be coupled with additional analysis for the user's purposes. Our system provides a web-based interface to select seed genes in any area of a Venn diagram and then perform network propagation analysis to measure the influence of the selected seed genes in terms of a ranked list of DEGs.
Conclusions: We suggest that our system can logically guide the selection of seed genes without additional prior knowledge, which frees users from the seed selection issue of network propagation. We showed that Venn-diaNet can reproduce the research findings reported in original papers that compare two, three, and eight experiments. Venn-diaNet is freely available at: http://biohealth.snu.ac.kr/software/venndianet. This publication has been funded by (i) the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), the Ministry of Science and ICT (MSIT) (No. NRF-2017M3C4A7065887), (ii) the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF), the Ministry of Science and ICT (MSIT) (No. NRF2014M3C9A3063541), and (iii) a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), the Ministry of Health & Welfare, Republic of Korea (Grant number: HI15C3224).
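The workflow described in the Venn-diaNet abstract (pick seed genes from a Venn diagram segment, then propagate their influence over an interaction network) can be sketched as follows. The abstract does not state which propagation algorithm is used, so random walk with restart, the restart probability, the toy network, and the function names are assumptions chosen for illustration and should not be read as the tool's actual implementation.

```python
def venn_segment(include, exclude):
    """Genes present in every list in `include` and in no list in `exclude`."""
    genes = set.intersection(*map(set, include))
    for other in exclude:
        genes -= set(other)
    return genes

def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart on an undirected graph (assumed algorithm).

    `adjacency` maps each node to a list of neighbours, and every neighbour
    must itself be a key. `seeds` must be non-empty. Isolated nodes pass
    nothing on, so they should not be used as seeds.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(start)
    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue
            # Split the walking share of this node's mass among its neighbours.
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        p = nxt
    return p

# Toy data: two hypothetical DEG lists and a four-gene interaction network.
exp1 = ["A", "B"]
exp2 = ["A", "C"]
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}

seeds = venn_segment([exp1, exp2], [])   # intersection of both experiments
scores = propagate(network, seeds)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes closer to the seed in the network receive higher scores, which is what allows genes inside a Venn segment to be ranked rather than only counted.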

    StressGenePred: a twin prediction model architecture for classifying the stress types of samples and discovering stress-related genes in arabidopsis

    Background: Recently, a number of studies have been conducted to investigate how plants respond to stress at the cellular molecular level by measuring gene expression profiles over time. As a result, a set of time-series gene expression data for the stress response is available in databases. With the data, an integrated analysis of multiple stresses is possible, which identifies stress-responsive genes with higher specificity because considering multiple stresses can capture the effect of interference between stresses. To analyze such data, a machine learning model needs to be built. Results: In this study, we developed StressGenePred, a neural network-based machine learning method, to integrate time-series transcriptome data of multiple stress types. StressGenePred is designed to detect single stress-specific biomarker genes by using a simple feature embedding method, a twin neural network model, and Confident Multiple Choice Learning (CMCL) loss. The twin neural network model consists of a biomarker gene discovery model and a stress type prediction model that share the same logical layer to reduce training complexity. The CMCL loss is used to make the twin model select biomarker genes that respond specifically to a single stress. In experiments using Arabidopsis gene expression data for four major environmental stresses (heat, cold, salt, and drought), StressGenePred classified the types of stress more accurately than the limma feature embedding method and the support vector machine and random forest classification methods. In addition, StressGenePred discovered known stress-related genes with higher specificity than the Fisher method. Conclusions: StressGenePred is a machine learning method for identifying stress-related genes and predicting stress types for an integrated analysis of multiple stress time-series transcriptome data.
This method can be applied to other phenotype-gene association studies. This work and publication costs were supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT (No. NRF2017M3C4A7065887), and the Collaborative Genome Program for Fostering New Post-Genome Industry of the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT) (No. NRF-2014M3C9A3063541). This work was supported for W.J. by the Agenda program (No. PJ014307), Rural Development of Administration of Republic of Korea.

    Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification

    Recently, graph neural networks (GNNs) have been widely used for document classification. However, most existing methods are based on static word co-occurrence graphs without sentence-level information, which poses three challenges: (1) word ambiguity, (2) word synonymity, and (3) dynamic contextual dependency. To address these challenges, we propose a novel GNN-based sparse structure learning model for inductive document classification. Specifically, a document-level graph is initially generated by a disjoint union of sentence-level word co-occurrence graphs. Our model collects a set of trainable edges connecting disjoint words between sentences, and employs structure learning to sparsely select edges with dynamic contextual dependencies. Graphs with sparse structure can jointly exploit local and global contextual information in documents through GNNs. For inductive learning, the refined document graph is further fed into a general readout function for graph-level classification and optimization in an end-to-end manner. Extensive experiments on several real-world datasets demonstrate that the proposed model outperforms most state-of-the-art results, and reveal the necessity of learning sparse structures for each document.

    SpliceHetero: An information theoretic approach for measuring spliceomic intratumor heterogeneity from bulk tumor RNA-seq.

    MOTIVATION: Intratumor heterogeneity (ITH) represents the diversity of cell populations that make up cancer tissue. The level of ITH in a tumor is usually measured by a genomic variation profile, such as copy number variation and somatic mutation. However, a recent study has identified ITH at the transcriptome level and suggested that ITH at gene expression levels is useful for predicting prognosis. Measuring ITH levels at the spliceome level is a natural extension. There are serious technical challenges in measuring spliceomic ITH (sITH) from bulk tumor RNA sequencing (RNA-seq) due to the complex splicing patterns. RESULTS: We propose an information-theoretic method to measure the sITH of bulk tumors to overcome the above challenges. This method has been extensively tested in experiments using synthetic data, xenograft tumor data, and TCGA pan-cancer data. As a result, we showed that sITH is closely related to cancer progression and clonal heterogeneity, along with clinically significant features such as cancer stage, survival outcome, and PAM50 subtype. As far as we know, this is the first study to define ITH at the spliceome level. This method can greatly improve the understanding of the cancer spliceome and has great potential as a diagnostic and prognostic tool.

    Author Correction: Subnetwork representation learning for discovering network biomarkers in predicting lymph node metastasis in early oral cancer (Scientific Reports, (2021), 11, 1, (23992), 10.1038/s41598-021-03333-5)

    © The Author(s) 2022. In the original version of this Article, Doh Young Lee was omitted as a corresponding author. Correspondence and requests for materials should also be addressed to [email protected]

    A probabilistic model for pathway-guided gene set selection

    © 2021 IEEE. Breast cancer is classified into five intrinsic subtypes, with differing treatment methods and prognoses. Therefore, accurate identification of subtypes from patient transcriptome data is essential. Many gene signatures, including PAM50, have been developed to classify breast cancer subtypes. However, existing gene selection methods do not utilize biological pathways. Gene signature selection using biological pathways can explain signature genes in terms of biological functions. Thus, we propose a probabilistic model for pathway-guided gene set selection using gene expression data. First, we defined gene and pathway factors based on gene expression and pathway activation levels, and calculated the posterior probability. Second, we adopted the prediction strength to guide gene set selection. Third, the gene set was selected using the posterior probability and prediction strength values. Finally, on evaluating the selected gene set, it was experimentally confirmed that our gene set performed better on classification tasks than the PAM50 gene set, a gene set produced by the XGBoost classifier, and a random gene set. Among the genes selected by our method, it was confirmed that the genes included in the cell cycle and circadian rhythm pathways showed different expression patterns for each breast cancer subtype. Our selected gene set exhibited biological significance in terms of pathway activation.

    Risk Stratification for Breast Cancer Patient by Simultaneous Learning of Molecular Subtype and Survival Outcome Using Genetic Algorithm-Based Gene Set Selection

    Simple Summary: Patient stratification is clinically important because it allows us to understand the characteristics of a group and establish treatment strategies for it. Transcriptomic data play an important role in determining molecular subtypes and predicting survival. In the case of breast cancer, although the order of prognosis according to molecular subtypes is well known, there is heterogeneity even within a subtype. Therefore, patient stratification considering both molecular subtypes and survival outcomes is required. In this study, a methodology to handle this problem is presented. A genetic algorithm is used to select a set of genes, and a risk score is assigned to each patient using their expression levels. According to the risk score, patients are ordered and stratified considering molecular subtypes and survival outcomes. Consequently, informative genes for patient stratification with respect to both aspects could be nominated, and the usefulness of the risk score was shown through comparison with other indicators. Patient stratification is a clinically important task because it allows us to establish and develop efficient treatment strategies for particular groups of patients. Molecular subtypes have been successfully defined using transcriptomic profiles, and they are used effectively in clinical practice, e.g., PAM50 subtypes of breast cancer. Survival prediction contributes to understanding diseases and to identifying genes related to prognosis. It is desirable to stratify patients considering these two aspects simultaneously. However, there are no methods for patient stratification that consider molecular subtypes and survival outcomes at once. Here, we propose a methodology to deal with the problem. A genetic algorithm is used to select a gene set from transcriptome data, and the expression quantities of the selected genes are utilized to assign a risk score to each patient. The patients are ordered and stratified according to the score.
A gene set was selected by our method on a breast cancer cohort (TCGA-BRCA), and we examined its clinical utility using an independent cohort (SCAN-B). In this experiment, our method was successful in stratifying patients with respect to both molecular subtype and survival outcome. We demonstrated that the orders of patients were consistent across repeated experiments, and prognostic genes were successfully nominated. Additionally, it was observed that the risk score can be used to evaluate the molecular aggressiveness of individual patients.